# High-Level Synthesis Performance Prediction using GNNs: Benchmarking, Modeling, and Advancing

Nan Wu nanwu@ucsb.edu University of California, Santa Barbara Santa Barbara, CA, USA Hang Yang innallyyang@hotmail.com Nankai University Tianjin, China

Yuan Xie yuanxie@ucsb.edu University of California, Santa Barbara Santa Barbara, CA, USA

Pan Li panli@purdue.edu Purdue University West Lafayette, IN, USA

# **Abstract**

Agile hardware development requires fast and accurate circuit quality evaluation from early design stages. Existing work on high-level synthesis (HLS) performance prediction usually needs extensive feature engineering after the synthesis process. To expedite circuit evaluation from as early a design stage as possible, we propose a rapid and accurate performance modeling approach that exploits the representation power of graph neural networks (GNNs) by representing C/C++ programs as graphs. The contribution of this work is threefold. First, we build a standard benchmark containing 40k synthesizable C programs, which includes both synthetic programs and three sets of real-world HLS benchmarks. Each program is implemented on FPGA to generate ground-truth performance metrics. Second, we formally formulate the HLS performance prediction problem on graphs, and propose multiple modeling strategies with GNNs that offer different trade-offs between prediction timeliness (early/late prediction) and accuracy. Third, we further propose a novel hierarchical GNN that does not sacrifice timeliness but largely improves prediction accuracy, significantly outperforming HLS tools. We conduct extensive evaluations on both synthetic and unseen real-case programs; our proposed predictor outperforms HLS by up to 40× and exceeds existing predictors by 2× to 5× in terms of resource usage and timing prediction.

# 1 Introduction

One essential requirement for agile hardware development is to evaluate circuit design quality quickly and accurately for rapid optimization iterations. Traditional EDA tools usually take hours to days to accurately evaluate circuit quality, with extensive manual effort. Although high-level synthesis (HLS) tools can greatly speed up circuit design, they still need minutes to hours for design synthesis, and can be largely inaccurate in terms of circuit quality evaluation [28]. Given the strong need for agile hardware development and a productivity boost, a quick and accurate performance evaluation at the earliest stage, even before HLS, is highly desirable.

Cong Hao callie.hao@ece.gatech.edu Georgia Institute of Technology Atlanta, GA, USA



**Figure 1.** The overall performance prediction flow. (a) Design flow starting from behavioral programs to hardware circuits. (b) An example program written in C. (c) The intermediate representation (IR) graph extracted by compiler front-ends. (d) The working flow of GNNs, predicting *actual* resource usage and timing merely based on raw IR graphs.

Prior work has investigated circuit performance evaluation before or after HLS, to predict synthesized or implemented design metrics such as resource usage, timing, power, and area. Analytical models are classic approaches [19, 32, 33] but they only work for highly regular dataflow such as perfect loops and arrays. Recent ML approaches have become promising in estimating the actual design performance [29]. Pyramid [15] assembled multiple ML models for resource and timing prediction. Both HLSPredict [18] and XPPE [16] are ANN-based cross-platform performance predictors that estimate the HLS design performance on FPGAs.

Despite this success, most ML-based methods rely on intensive and empirical feature engineering: a large number of features must be obtained from the HLS synthesis report or from the intermediate results of a partially executed implementation process, which is still time-consuming. Therefore, in this work, we aim to approach HLS performance prediction at its earliest stage, with the fewest features, right after front-end compilation. Since programs are usually represented as intermediate representation (IR) graphs at an early stage, we exploit the representation power of graph neural networks (GNNs) and adopt various GNNs for timely performance prediction based on IR graphs. Fig. 1 shows the overall prediction flow: we extract IR graphs right after HLS front-end compilation, and directly predict the actual circuit performance that is expected to be obtained after implementation.

To comprehensively investigate this problem, we propose prediction algorithms at different stages of HLS and discuss their trade-offs between prediction accuracy and efficiency: the earlier the prediction is made, the more beneficial it is for agile design, but the less information is available. We then propose a novel hierarchical GNN-based predictor that strikes a favorable trade-off: it can *predict at the earliest stage but still with sufficient domain-specific information*. Further, to benefit follow-up research, we standardize the problem formulation and develop a rich benchmark suite. We summarize our contributions as follows:

- **Benchmarking.** We build a standard benchmark containing 40k C programs, each with an IR graph. The programs are synthesized by an HLS tool and implemented on FPGA to obtain their actual resource usage and performance. Three sets of real-world benchmarks are included for generalization evaluation.
- **Modeling.** To study the trade-offs between prediction accuracy and efficiency on IR graphs, we first propose two approaches using GNNs: (1) an off-the-shelf approach at the earliest stage with the least domain-specific information; (2) a knowledge-rich approach at a later stage with HLS auxiliary information to improve prediction accuracy. We aim to provide domain insights for future GNN design.
- **Advancing.** We propose a third, knowledge-infused approach, a novel hierarchical GNN that inherits the advantages of both the earliest prediction and domain knowledge. The model is composed of a node-level classification task and a graph-level regression task, which first classifies the resource type and then regresses the resource usage values. It largely improves the prediction accuracy with zero overhead at inference time.
- **Evaluation.** We conduct extensive evaluations on both synthetic and *unseen* real-case programs; our proposed predictor outperforms HLS by up to 40× and exceeds the existing predictor IronMan [28] by 2× to 5× in terms of resource usage and timing prediction.

# 2 Performance Prediction Strategies

There are two fundamental questions for performance prediction: **when** and **how**.

When to predict. Performance prediction for a circuit design can be conducted at different stages in the synthesis flow, e.g., before or after HLS or during implementation. For instance, HLS tools take in a behavioral language, convert it to RTL, and produce a synthesis report, which is an early estimation of the final performance, as shown in Fig. 1. However, it is observed that HLS prediction (the synthesis report) can be largely inaccurate [23, 28] even after minutes or hours of synthesis. Both Pyramid [15] and XPPE [16] make predictions for resource usage and timing after HLS by extracting features from HLS synthesis reports, while HLSPredict [18] predicts FPGA cycle count and power before HLS. Typically, an early and timely prediction is beneficial for agile development, but comes with less useful domain-specific knowledge.

How to predict. Existing ML-based prediction approaches have attempted linear regression, artificial neural networks, support vector machines, random forests, Lasso, and assembled models. Although promising, these models require rich features as model inputs and thus need heavy feature engineering. For instance, Pyramid/XPPE and HLSPredict require up to 183 and 75 features, respectively, which can only be obtained by actually running HLS or CPU/FPGA subtrace generation. Therefore, these strategies do not generalize easily or quickly to new designs.

Our prediction strategy. Our goal is to assist *agile hardware development* by making the performance prediction *as early as possible* and also *as accurate as possible*. Specifically, we focus on predicting the *actual* design performance values on FPGA, including resource utilization and critical path timing.

To address the existing limitations (late prediction, hard to generalize) and to meet our goal, we propose three early prediction approaches, each with a different amount of domain-specific knowledge, leading to varied prediction accuracy. Fig. 2 illustrates the three approaches. To make prediction *timely*, we base it on the intermediate representation (IR) graph of a program, i.e., the data flow graph (DFG) or control data flow graph (CDFG), which can be extracted within seconds after the front-end compilation [1, 26]. To make it *generalizable*, we propose to apply graph neural networks (GNNs) on DFGs/CDFGs, exploiting the inductiveness of GNNs to make predictions for completely unseen designs without retraining. Specifically, our three approaches are:

- Off-the-shelf approach. The first approach makes prediction
  at the earliest stage by taking the IR graph (DFG/CDFG)
  as the GNN model input, and directly predicts the design
  performance metrics. The features can be obtained after
  HLS front-end compilation, resulting in fastest prediction
  since the compilation usually takes only a few seconds.
- Knowledge-rich approach. The second approach aims to make more accurate predictions by taking auxiliary domain information from intermediate HLS results: the resource usage associated with each node. The features must be obtained during HLS execution, resulting in later but more accurate prediction.



**Figure 2.** Our three proposed approaches: (a) off-the-shelf approach at the earliest stage for prediction; (b) knowledge-infused approach also at the earliest stage but with self-inferred domain-specific information; (c) knowledge-rich approach.

- Knowledge-infused approach. The third approach is a novel hierarchical GNN that possesses the advantages of both the first and second: it not only makes the earliest prediction but can also infuse self-inferred domain-specific knowledge with almost zero overhead during inference. Specifically, it takes the IR graph as input and makes predictions in two steps: node-level classification and graph-level regression.

The approach details are introduced in Section 4. Prior to that, we introduce our benchmark suite first in Section 3.

# 3 Benchmarking

The goal of benchmarking is to facilitate more ML-related research on rapid performance prediction by providing abundant synthesizable programs together with *actual* performance values on FPGA, i.e., after implementation.

### 3.1 Benchmark Format

**Input.** We let the input of a predictor be the IR graph of a program, which can be quickly extracted after the front-end compilation [1, 26]. In HLS, DFGs and CDFGs are the most common IR graphs. Specifically, DFGs are extracted from *basic blocks*, i.e., straight-line code sequences with no branches in except at the entry and no branches out except at the exit [11]; CDFGs are extracted from programs with loops. DFGs are directed acyclic graphs without any structural loops, while CDFGs contain additional nodes and edges/loops for control dependency.
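The structural difference between the two graph types can be checked mechanically. The sketch below is illustrative only: it assumes a graph is given as a node count plus a list of directed `(src, dst)` edges (a hypothetical encoding, not the benchmark's file format) and uses Kahn's topological sort to test for acyclicity.

```python
from collections import deque

def is_acyclic(num_nodes, edges):
    """Return True if the directed graph has no cycle (Kahn's algorithm)."""
    indegree = [0] * num_nodes
    successors = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    # Start from nodes that have no incoming edges.
    ready = deque(n for n in range(num_nodes) if indegree[n] == 0)
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    # Nodes on a cycle never reach indegree 0, so they are never visited.
    return visited == num_nodes

# Hypothetical toy graphs: two loads feed an add, which feeds a store.
dfg_edges = [(0, 2), (1, 2), (2, 3)]
# Adding a loop-carried back edge turns the graph into a CDFG-like graph.
cdfg_edges = dfg_edges + [(3, 0)]
```

Under this encoding, a DFG should pass the check, while a CDFG with at least one back edge should fail it.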

**Node/Edge Features**. The three proposed approaches use different sets of node features, as listed in Table 1. After HLS front-end compilation, there are seven features immediately available for each node to be used by the first off-the-shelf approach, such as node category, bitwidth, and opcode. For the knowledge-infused and knowledge-rich approaches, we include the resource type and resource value, respectively, for each node as node features. Notably, for the knowledge-infused approach, the auxiliary node features (i.e., resource type) are only used during training but not inference. Each edge has two features: the discrete edge type as an integer, and a binary signal marking whether this edge is a back edge.

**Tasks/labels.** We provide two types of tasks, a *node-level classification* task and a *graph-level regression* task, where the former is easier than the latter.

- For the *node-level classification* task, we assign each node in the DFG/CDFG a label indicating the resource type(s) that the node will use in its final implementation. We consider three resource categories: DSP, LUT, and FF. A node can be implemented by zero, one, or more resource types. For example, a sdiv node may use both DSP and LUT; a partselect node uses FF only; a node uses nothing if it is for control, e.g., indicating a branch entry (br). We organize the resource type prediction as three binary classification tasks. If a node falls into none of the three, it is regarded as empty, i.e., not associated with any resource.
- For the graph-level regression task, we label the entire graph using its implemented performance metric values. We consider four metrics for regression: DSP, FF, LUT, CP. The first three are integer numbers indicating how much these resources are used; the last one is critical path timing slack in fractional number, determining the FPGA's maximum working frequency.
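As an illustration of the two label formats, the sketch below encodes the node-level examples given above as three binary targets and shows one graph-level label. The function name, the dictionary layout, and the numeric graph-level values are hypothetical and are not taken from the benchmark itself.

```python
RESOURCES = ("DSP", "LUT", "FF")

def encode_node_label(used_resources):
    """Multi-hot node label: one binary target per resource type."""
    unknown = set(used_resources) - set(RESOURCES)
    if unknown:
        raise ValueError(f"unknown resource types: {sorted(unknown)}")
    return [1 if r in used_resources else 0 for r in RESOURCES]

# Node-level examples mentioned in the text.
sdiv_label = encode_node_label({"DSP", "LUT"})   # may use both DSP and LUT
partselect_label = encode_node_label({"FF"})     # uses FF only
br_label = encode_node_label(set())              # control node: "empty"

# Graph-level label: three integer resource counts plus a fractional
# critical-path value. The numbers here are made up for illustration.
graph_label = {"DSP": 4, "LUT": 312, "FF": 128, "CP": 6.85}
```

Treating the three resource types as independent binary targets lets a single node carry zero, one, or several resource types without needing a separate class for every combination.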

### 3.2 Benchmark Generation

We construct the benchmark suite from synthetic, synthesizable C programs as well as real-world HLS applications. The synthetic programs fall into two categories: basic blocks that derive DFGs, and programs with control loops and branches that derive CDFGs. All of the synthetic programs are generated by the C program generator ldrgen [2]. There are 19,120 and 18,570 C programs in the DFG and CDFG datasets, respectively, for graph-level tasks. The node-level dataset contains more than 660k nodes derived from DFGs and CDFGs. In addition, there are three sets of real-world HLS applications: MachSuite [21], CHStone [10], and PolyBench/C [20], consisting of 16, 10, and 30 different applications, respectively. The real-world applications are used for generalization evaluation of GNN models.

**Table 1.** Node features and example values.

| Feature                                             | Description                                     | Values                               |
|-----------------------------------------------------|-------------------------------------------------|--------------------------------------|
| **Off-the-shelf approach with minimum information** |                                                 |                                      |
| Node type                                           | General node type                               | operation nodes, blocks, ports, misc |
| Bitwidth                                            | Bitwidth of the node                            | 0~256, misc                          |
| Opcode type                                         | Opcode categories based on LLVM                 | binary_unary, bitwise, memory, etc.  |
| Opcode                                              | Opcode of the node                              | load, add, mux, xor, icmp, etc.      |
| Is start of path                                    | Whether the node is the starting node of a path | 0, 1, misc                           |
| Cluster group                                       | Cluster number of the node                      | -1~256, misc                         |
| **Knowledge-infused and knowledge-rich approach**   |                                                 |                                      |
| DSP                                                 | DSP used for this node?                         | binary/integer values, misc          |
| LUT                                                 | LUT used for this node?                         | binary/integer values, misc          |
| FF                                                  | FF used for this node?                          | binary/integer values, misc          |

# 4 Modeling and Advancing with GNNs

In this section, we introduce the three proposed GNN-based approaches with trade-offs between timeliness and accuracy. GNNs operate by propagating information along the edges of a given graph. By stacking multiple GNN layers, each node can receive information from multi-hop neighbors and locally characterize the corresponding receptive field for node-level tasks. Graph pooling then summarizes global information to perform graph-level prediction tasks.
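The propagation and pooling steps described above can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, with no learned weights or non-linearities, and the function names are hypothetical; it only shows how stacking layers widens each node's receptive field and how pooling yields a graph-level vector.

```python
def message_passing_layer(features, edges):
    """One simplified layer: each node adds the feature vectors of its
    in-neighbors to its own (no learned weights, no non-linearity)."""
    dim = len(features[0])
    updated = [list(vec) for vec in features]
    for src, dst in edges:
        for k in range(dim):
            updated[dst][k] += features[src][k]
    return updated

def sum_pool(features):
    """Graph representation as the element-wise sum over all nodes."""
    return [sum(column) for column in zip(*features)]

def mean_pool(features):
    """Graph representation as the element-wise mean over all nodes."""
    return [total / len(features) for total in sum_pool(features)]

# Chain 0 -> 1 -> 2 where only node 0 carries a non-zero feature.
chain_edges = [(0, 1), (1, 2)]
initial = [[1.0], [0.0], [0.0]]
after_one = message_passing_layer(initial, chain_edges)
after_two = message_passing_layer(after_one, chain_edges)
```

After one layer, node 2 has seen nothing from node 0; after two layers it has, which is the multi-hop effect of stacking layers.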

### 4.1 Modeling: Off-the-Shelf Approach with State-of-the-Art GNN Models

In the off-the-shelf approach, we screen several state-of-the-art GNN models, aiming to identify (1) which properties of existing GNN models would help with resource/timing prediction and (2) how domain-specific insights can be combined with these properties to improve prediction accuracy. Fourteen GNN models are selected from four categories based on how topological and relational information in graphs is exploited; they are briefly introduced as follows.

- Graph convolutional network (GCN) and variants: (1) GCN [12]; (2) GCN equipped with a virtual node [8]; (3) SGC [27], a simplified version of GCN; (4) GraphSage [9], a variant of GCN sampling a fixed number of neighboring nodes to keep the computational footprint consistent; (5) ARMA [3], a variant of GCN with auto-regressive moving average filters; (6) PAN [14], a generalization of GCN assigning trainable weights to each path based on its length. Previous work IronMan [28] also has a similar GCN-based performance predictor.
- Graph isomorphism network (GIN) and variants. (1) GIN [31], provably as powerful as the Weisfeiler-Lehman graph isomorphism test; (2) GIN with a virtual node [8]; (3) PNA [5], leveraging complementary aggregators to better understand graph structures and retain neighborhood information, especially for a continuous input feature space.
- Employing multi-relational information. (1) GAT [24], using attention mechanisms to implicitly assign different importance to nodes in the same neighborhood; (2) GGNN [13], using trainable edge-dependent weights with gated recurrent units; (3) RGCN [22], using edge-dependent weights with non-linear activation.
- Inspired from vision tasks. (1) Graph U-Net [7], using an encoder-decoder structure on graphs; (2) GNN-FiLM [4], combining feature-wise linear modulation (FiLM) with the message passing procedure.

To evaluate these models fairly, we use the same GNN structure (e.g., embedding, layer count) but with different types of GNN layers. The goal is to directly predict actual resource/timing based on IR graphs without invoking HLS. This approach makes the earliest prediction since the HLS front-end compilation is the very first step of an EDA design flow. Although it offers the best timeliness, its accuracy is compromised because device-specific information is ignored.
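Among the listed models, PNA is described as leveraging complementary aggregators. The sketch below illustrates that idea with the mean, max, min, and standard-deviation aggregators; it omits PNA's degree-based scalers and all learned transformations, and the function name and the example bitwidth values are hypothetical.

```python
from statistics import mean, pstdev

def multi_aggregate(neighbor_values):
    """Summarize a node's neighbor values with several aggregators."""
    return {
        "mean": mean(neighbor_values),
        "max": max(neighbor_values),
        "min": min(neighbor_values),
        "std": pstdev(neighbor_values),
    }

# Two hypothetical neighborhoods (e.g., neighbor bitwidths) with equal means.
uniform_neighbors = multi_aggregate([8, 8, 8])
mixed_neighbors = multi_aggregate([2, 6, 16])
```

A mean-only aggregator would treat these two neighborhoods as identical, whereas the max, min, and std entries tell them apart; this is the kind of neighborhood information that a single aggregator can lose.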

### 4.2 Modeling: Knowledge Rich Approach with Selected GNN Models

To include device information revealed during the design flow, we devise the knowledge rich approach, which takes both IR graphs and auxiliary information from intermediate HLS results as inputs, as shown in Fig. 2(c). The auxiliary information from HLS tools indicates both the type(s) of resource and the exact number of each resource used in final implementation for every node in IR graphs. As each node is marked with pre-characterized resource estimations, GNN models pay more attention to resource interference/sharing among nodes, achieving much better prediction accuracy.

Armed with rich domain knowledge, this approach places more emphasis on prediction accuracy, especially for resource estimation, yet compromises timeliness since HLS tools take some time to generate intermediate results.

### 4.3 Advancing: Knowledge Infused Approach with Hierarchical GNN Models

To strike a balance between timeliness and accuracy, we propose the knowledge infused approach with hierarchical GNN models. As depicted in Fig. 2(b), the resource/timing prediction task is disentangled into two sub-tasks: node-level classification that annotates resource types associated with each node, and graph-level regression that predicts actual resource/timing with the annotated graphs. During the hierarchical training, the node-level classification takes IR graphs as inputs, and the domain knowledge is infused by providing labels to each node that denote resource types used in final implementation based on HLS intermediate results; the graph-level regression then takes both IR graphs and ground-truth resource types as inputs, aiming to convey the infused domain knowledge from node-level to graph-level tasks and to improve final prediction accuracy. During the hierarchical inference, the only required inputs are IR graphs: first, the node-level GNN model infers resource types for each node; second, combining the node-level inference results with original IR graphs, the graph-level regression grasps self-inferred domain knowledge to perform final predictions.

Taking advantage of knowledge infusion during training, this approach demonstrates a good balance between timeliness and accuracy: it predicts resource/timing from the earliest stage while using adequate domain information to improve prediction accuracy.
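The two-step inference flow can be sketched as follows. This is a minimal illustration and not the authors' implementation: `hierarchical_predict`, `toy_classifier`, and `toy_regressor` are hypothetical names, node features are assumed to be `[opcode_id, bitwidth]` lists, and the two toy functions are hand-written placeholders standing in for trained node-level and graph-level GNNs.

```python
def hierarchical_predict(node_features, edges, node_classifier, graph_regressor):
    """Two-step inference that needs only the IR graph as input."""
    # Step 1: node-level classification yields [DSP, LUT, FF] flags per node.
    inferred_types = node_classifier(node_features, edges)
    # Step 2: append the self-inferred flags to the original node features
    # and run graph-level regression on the annotated graph.
    annotated = [feat + flags for feat, flags in zip(node_features, inferred_types)]
    return graph_regressor(annotated, edges)

def toy_classifier(node_features, edges):
    # Placeholder rule, NOT a trained GNN: wide nodes are flagged as DSP users,
    # and every node is flagged as a LUT user.
    return [[1 if bitwidth >= 16 else 0, 1, 0] for _, bitwidth in node_features]

def toy_regressor(annotated, edges):
    # Placeholder, NOT a trained GNN: count the nodes flagged as DSP users.
    # The DSP flag is the third entry: [opcode_id, bitwidth, DSP, LUT, FF].
    return {"DSP": sum(node[2] for node in annotated)}

ir_nodes = [[0, 32], [1, 8], [0, 16]]   # hypothetical [opcode_id, bitwidth]
ir_edges = [(0, 2), (1, 2)]
prediction = hierarchical_predict(ir_nodes, ir_edges, toy_classifier, toy_regressor)
```

During training, the graph-level model would instead receive ground-truth resource types; at inference, as here, the flags come from the node-level model, so no HLS run is needed.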

# 5 Experiment

### 5.1 Experimental Setup

All GNN models are implemented with PyTorch Geometric [6]. The ground-truth (actual) resource usage (LUT/DSP/FF) and CP timing are synthesized by Vitis HLS [25] and implemented by Vitis [30]. DFG and CDFG datasets are randomly split into 80% train, 10% validation, and 10% test; real-world benchmarks are only used for generalization evaluation. Each GNN is empirically set to five layers with a hidden dimension of 300. For graph-level regression, sum or mean pooling is used to derive graph representations, followed by a feed-forward network with the structure 300-600-300-1. Models are trained using the Adam optimizer for 100 epochs. Learning rates, dropout, and other hyper-parameters are tuned on the validation set. Each model is trained in five runs with different random seeds, and we report the average of the three runs with the lowest validation error.
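The data split and the run-selection rule can be sketched as below. The function names are hypothetical, and the sketch assumes that the reported number is the mean test error of the three selected runs, which the text does not state explicitly.

```python
import random

def split_indices(num_graphs, seed=0):
    """Random 80/10/10 train/validation/test split over graph indices."""
    order = list(range(num_graphs))
    random.Random(seed).shuffle(order)
    n_train = num_graphs * 8 // 10   # integer arithmetic avoids float rounding
    n_val = num_graphs // 10
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]
    return train, val, test

def report_best_three(runs):
    """runs: list of (validation_error, test_error) pairs, one per seed.
    Average the test error of the three runs with the lowest validation error
    (assumption: the averaged quantity is the test error)."""
    best = sorted(runs, key=lambda run: run[0])[:3]
    return sum(test_error for _, test_error in best) / len(best)

# Example with the DFG dataset size from Section 3.2 and made-up run errors.
train_idx, val_idx, test_idx = split_indices(19120)
example_runs = [(0.12, 0.13), (0.10, 0.11), (0.15, 0.16), (0.11, 0.12), (0.20, 0.30)]
reported = report_best_three(example_runs)
```

Selecting runs by validation error rather than test error keeps the test set out of the model-selection loop.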

### 5.2 Modeling: SOTA GNN Analysis

We discuss the off-the-shelf approach from three aspects: how different applications (i.e., graphs) influence prediction accuracy, which properties of existing GNN models help improve accuracy, and what domain-specific insights can be derived to facilitate future graph representation learning for fast evaluation in EDA tasks.

**Different graphs: DFG vs. CDFG.** Table 2 reports the mean absolute percentage error (MAPE) of predictions on DFGs and CDFGs from synthetic programs. The MAPE on CDFGs is larger than that on DFGs, which is attributable to two major reasons. First, DFGs have no loops but CDFGs typically include a considerable number of loops. Since message-passing-based GNN models have limited expressiveness and are no more powerful than the 1-Weisfeiler-Lehman isomorphism test [17], they are not well suited to handling graphs with many loops. Second, control signals introduce additional nodes/edges that represent control states and dependency, which are seemingly unrelated to resource usage. These nodes/edges easily confuse GNN models during resource prediction.
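For reference, MAPE can be computed as in the sketch below. The function name and the example values are hypothetical; the sketch also assumes every ground-truth value is non-zero, since how zero-valued targets (e.g., designs that use no DSPs) are handled is not stated in the text.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.
    Assumes every actual value is non-zero."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Two made-up designs, each predicted with a 10% relative error.
example_mape = mape([100, 200], [110, 180])
```

Because each error is normalized by the ground-truth value, MAPE is comparable across metrics with very different magnitudes, such as LUT counts and CP timing.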

**Table 2.** MAPE of graph-level regression with different GNN models on DFG and CDFG datasets. The top two performant models are marked in bold.

|       | DFG        |            |            |           | CDFG       |            |            |           |
|-------|------------|------------|------------|-----------|------------|------------|------------|-----------|
|       | DSP        | LUT        | FF         | CP        | DSP        | LUT        | FF         | CP        |
| GCN   | 16.31%     | 16.49%     | 21.27%     | **6.12%** | 25.30%     | 28.64%     | 38.34%     | 8.79%     |
| GCN-V | 15.72%     | 15.93%     | 21.64%     | 6.36%     | 17.31%     | 33.93%     | 39.94%     | **8.13%** |
| SGC   | 42.12%     | 23.93%     | 30.61%     | 7.92%     | 44.01%     | 60.87%     | 53.50%     | 10.32%    |
| SAGE  | 15.18%     | 14.01%     | 17.11%     | **6.12%** | 17.01%     | 28.09%     | 39.11%     | **8.25%** |
| ARMA  | 19.12%     | 13.46%     | 16.87%     | 6.50%     | 18.47%     | **25.21%** | 32.15%     | 8.42%     |
| PAN   | 15.24%     | 14.13%     | 17.23%     | 6.38%     | 16.88%     | 32.65%     | 44.36%     | 8.54%     |
| GIN   | 15.52%     | 16.10%     | 22.08%     | 6.58%     | 15.47%     | 28.48%     | 38.82%     | 8.76%     |
| GIN-V | 15.04%     | 16.17%     | 23.09%     | 6.40%     | 17.94%     | 29.40%     | 48.64%     | 8.59%     |
| PNA   | **12.65%** | **11.64%** | **14.41%** | 6.26%     | **14.71%** | **22.86%** | **26.47%** | 8.87%     |
| GAT   | 26.22%     | 22.64%     | 27.74%     | 8.30%     | 28.66%     | 46.19%     | 54.73%     | 10.32%    |
| GGNN  | 15.40%     | 13.64%     | 16.94%     | 6.47%     | 16.28%     | 28.05%     | 31.88%     | 8.50%     |
| RGCN  | **13.27%** | 13.03%     | **15.09%** | 6.14%     | **15.03%** | 26.33%     | **25.52%** | 8.72%     |
| UNet  | 18.40%     | 14.90%     | 19.17%     | 6.61%     | 18.92%     | 32.83%     | 53.06%     | 9.02%     |
| FiLM  | 20.05%     | **12.50%** | 16.94%     | 6.27%     | 17.42%     | 26.97%     | 27.35%     | 8.67%     |

**Model analysis**. PNA and RGCN generally show superior performance, implying two takeaways. First, the relational information (i.e., edge information) is important in IR graphs, since edges represent data or control dependency, or a mix of both, which is a critical basis in logic synthesis and impacts resource allocation. Second, equipped with multiple aggregators, PNA is better able to characterize different neighborhood information, thus making better predictions.

**Domain-specific insights.** (1) Resource. Among the three types of resource, DSPs are mainly used for computation; FFs often relate to memory operations and small arrays; LUTs may appear in computation, memory, or control nodes. The key to making precise DSP predictions is to distinguish the major computation nodes that are most likely to use DSPs. For instance, a multiplication node with a large bitwidth tends to use DSPs, while divisions and bitwise operations prefer LUTs. Similarly, effective extraction of memory-related nodes would greatly benefit FF predictions. Since LUTs are involved in the entire graph (as computation units and as glue logic between circuit components), graph-level understanding is important. In summary, it is helpful to carefully characterize neighborhood information from each node's predecessors, successors, itself, and their relations, such that the sophisticated mapping rules from heterogeneous nodes to resource usage can be clearly understood and quantitatively learned.

(2) Timing. Compared with resource predictions, CP timing predictions show relatively lower MAPE and better consistency between DFGs and CDFGs. A probable reason is that CP timing is local information and thus is insensitive to graph size as long as the critical path segment can be recognized.

### 5.3 Advancing: Comparison of Three Approaches

We first discuss the results of the knowledge infused approach, and then comprehensively compare the three proposed approaches with HLS and prior work IronMan [28].

**Knowledge infused approach.** Essentially, using GNNs to predict actual resource/timing from IR graphs is to approximate the set of sophisticated heuristics and mapping rules used by HLS scheduling/binding and logic/physical synthesis during the design flow. The evaluation of the off-the-shelf approach indicates that *plug-in application of GNNs cannot approximate such underlying rules well*. Thus, in addition to infusing domain knowledge during training, another motivation of the hierarchical structure in the knowledge infused approach is to *divide and conquer*. The complicated performance prediction task is decoupled into two simpler subtasks: for *node-level classification*, Table 3 shows the prediction accuracy of classifying resource types, where high accuracy is achieved in most cases since local neighborhood characterization is enough for node-level resource type classification; for *graph-level regression*, Table 4 displays the MAPE of predictions on synthetic programs, showing an obvious accuracy boost compared with the off-the-shelf approach.

With the hierarchical training, both the node-level and the graph-level GNN models in Fig. 2(b) are approximating simplified design heuristics. Specifically, the node-level classification aims to understand the preference of resource types on different nodes; the graph-level regression focuses on globally estimating resource sharing and interference among nodes. With the hierarchical inference, the domain knowledge infused during training can be self-inferred when encountering unseen designs, leading to improved prediction accuracy from the earliest design stage.

**Accurate, timely, and generalizable.** The three approaches explore different trade-offs between timeliness and accuracy. Intuitively, the more domain information is leveraged, the more accurate the predictions are, but the longer feature collection takes. The off-the-shelf approach makes predictions from the earliest stage simply with IR graphs, at the cost of accuracy loss because domain knowledge is ignored. The knowledge rich approach provides the best prediction accuracy, yet has to wait for HLS tools to provide intermediate results, sacrificing timeliness. The knowledge infused approach strikes a balance: it infuses adequate domain knowledge during training, and makes predictions from the earliest stage during inference.

Generalization capability is a key indicator of whether an ML/GNN-based approach can be widely applied to certain EDA tasks. Table 5 shows the MAPE of the three proposed approaches and Vitis HLS on real-case applications. Compared with Vitis HLS, our approaches significantly improve prediction accuracy, especially for LUT/FF usage and CP timing. Specifically, the PNA-based knowledge-infused approach outperforms HLS by 1.2× to 40.6×, while the PNA-based knowledge-rich approach outperforms HLS by 1.7× to 51.4×. Note that since IronMan [28] is a variant of off-the-shelf GCN, whose performance is inferior to RGCN, the results in Table 5 imply that our proposed hierarchical GNN outperforms IronMan by at least 2.1× to 5.0×.
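The improvement factors over HLS quoted above follow directly from the MAPE ratios in Table 5. The short sketch below reproduces that arithmetic; the values are copied from the HLS, PNA-I, and PNA-R columns of Table 5 in row order, and the helper name is hypothetical.

```python
# MAPE values (in percent) copied from Table 5, one entry per table row.
HLS = [26.07, 871.56, 322.86, 32.09]
PNA_I = [21.95, 21.45, 20.10, 4.80]
PNA_R = [15.20, 16.96, 17.42, 3.97]

def improvement_factors(baseline, ours):
    """Ratio of baseline MAPE to our MAPE per row, rounded to one decimal."""
    return [round(b / o, 1) for b, o in zip(baseline, ours)]

infused_vs_hls = improvement_factors(HLS, PNA_I)
rich_vs_hls = improvement_factors(HLS, PNA_R)
```

The smallest and largest entries of `infused_vs_hls` and `rich_vs_hls` correspond to the 1.2× to 40.6× and 1.7× to 51.4× ranges; the largest factor in both cases comes from the row in which HLS has a MAPE of 871.56%.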

These results empirically demonstrate (1) generalization capability not only from seen to unseen designs but also from synthetic to realistic applications, and (2) accuracy and timeliness that clearly surpass those of HLS tools.

**Table 3.** Prediction accuracy of node-level resource classification with four different GNN models on DFGs, CDFGs and real-case applications.

|             | DFG    |        |        | CDFG   |        |        | Real Case |        |        |
|-------------|--------|--------|--------|--------|--------|--------|-----------|--------|--------|
|             | DSP    | LUT    | FF     | DSP    | LUT    | FF     | DSP       | LUT    | FF     |
| GCN         | 93.79% | 84.84% | 88.66% | 83.00% | 77.01% | 64.74% | 79.70%    | 81.83% | 86.82% |
| SAGE        | 93.06% | 87.32% | 92.09% | 85.65% | 78.41% | 60.40% | 87.39%    | 86.44% | 55.88% |
| GIN         | 93.80% | 84.93% | 91.57% | 79.24% | 73.05% | 65.78% | 74.70%    | 75.53% | 72.24% |
| RGCN        | 93.91% | 87.13% | 91.52% | 85.80% | 78.46% | 68.92% | 90.82%    | 88.83% | 91.55% |

**Table 4.** MAPE of the three proposed approaches with RGCN/PNA on DFG and CDFG datasets. The default notation denotes the off-the-shelf approach; -I denotes the knowledge-infused approach; -R denotes the knowledge-rich approach.

|                   | DFG    |        |        |       | CDFG   |        |        |       |
|-------------------|--------|--------|--------|-------|--------|--------|--------|-------|
|                   | DSP    | LUT    | FF     | CP    | DSP    | LUT    | FF     | CP    |
| RGCN <sup>1</sup> | 13.27% | 13.03% | 15.09% | 6.14% | 15.03% | 26.33% | 25.52% | 8.72% |
| RGCN-I            | 10.60% | 10.25% | 12.47% | 5.70% | 12.65% | 20.55% | 19.01% | 6.78% |
| RGCN-R            | 8.86%  | 8.58%  | 10.18% | 4.91% | 10.98% | 14.06% | 16.65% | 5.46% |
| PNA               | 12.65% | 11.64% | 14.41% | 6.26% | 14.71% | 22.86% | 26.47% | 8.87% |
| PNA-I             |        |        |        |       |        |        |        |       |
| PNA-R             | 7.06%  | 4.02%  | 5.78%  | 5.39% | 8.95%  | 10.27% | 11.22% | 5.81% |

**Table 5.** Testing MAPE of the three proposed approaches with RGCN/PNA on real-case applications.

|     | HLS     | RGCN   | RGCN-I | RGCN-R | PNA    | PNA-I  | PNA-R  |
|-----|---------|--------|--------|--------|--------|--------|--------|
| DSP | 26.07%  | 45.61% | 40.89% | 32.90% | 40.06% | 21.95% | 15.20% |
| LUT | 871.56% |        | 30.91% | 24.08% | 56.34% | 21.45% | 16.96% |
| FF  | 322.86% |        | 38.75% | 27.72% | 47.65% | 20.10% | 17.42% |
| CP  | 32.09%  | 8.13%  | 5.35%  | 5.83%  | 8.68%  | 4.80%  | 3.97%  |
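The improvement factors quoted for the PNA-based approaches follow directly from the Table 5 entries: each factor is the ratio of the HLS MAPE to the model's MAPE on the same metric. The following is a minimal sketch of that arithmetic, not part of the original evaluation code. The helper name `improvement_over_hls` is hypothetical, and the row order DSP, LUT, FF, CP is assumed to match the column order used in Tables 3 and 4.

```python
# MAPE values (in percent) copied from Table 5.
# Assumption: rows are ordered DSP, LUT, FF, CP, as in Tables 3 and 4.
mape = {
    "HLS":   {"DSP": 26.07, "LUT": 871.56, "FF": 322.86, "CP": 32.09},
    "PNA-I": {"DSP": 21.95, "LUT": 21.45,  "FF": 20.10,  "CP": 4.80},
    "PNA-R": {"DSP": 15.20, "LUT": 16.96,  "FF": 17.42,  "CP": 3.97},
}


def improvement_over_hls(model):
    """Return HLS MAPE divided by the model's MAPE for each metric.

    Hypothetical helper for illustration; a ratio above 1 means the
    model's error is lower than that of the HLS report.
    """
    return {metric: mape["HLS"][metric] / mape[model][metric]
            for metric in mape["HLS"]}


for model in ("PNA-I", "PNA-R"):
    ratios = improvement_over_hls(model)
    lo, hi = min(ratios.values()), max(ratios.values())
    print(f"{model}: {lo:.1f}x to {hi:.1f}x")
```

Under these assumptions the script prints `PNA-I: 1.2x to 40.6x` and `PNA-R: 1.7x to 51.4x`, matching the ranges stated in the text. In both cases the smallest gain is on DSP and the largest on LUT.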

# 6 Conclusion

In this work, we discussed three approaches for early circuit performance prediction using GNNs: (1) the off-the-shelf approach, which makes the earliest prediction with the least domain-specific information and performs on par with HLS; (2) the knowledge-rich approach, which makes a late prediction after HLS with auxiliary information and performs significantly better than HLS; and (3) the knowledge-infused approach, which makes the earliest prediction in a two-step hierarchical manner with self-inferred knowledge and still significantly outperforms HLS. We also constructed a standard benchmark suite to facilitate future research. This work not only demonstrates the great potential of GNNs in EDA, but also advances GNN design by proposing innovative architectures.

#### References

- [1] A Aho et al. 2007. Compilers: Principles, Techniques and Tools. Addison Wesley (2007).

<sup>1</sup>An advanced version of IronMan by considering relational information.

- [2] Gergö Barany. 2017. Liveness-driven random program generation. In LOPSTR. Springer.
- [3] Filippo Maria Bianchi et al. 2021. Graph neural networks with convolutional arma filters. In TPAMI. IEEE.
- [4] Marc Brockschmidt. 2020. Gnn-film: Graph neural networks with feature-wise linear modulation. In ICML.
- [5] Gabriele Corso et al. 2020. Principal Neighbourhood Aggregation for Graph Nets. In NeurIPS.
- [6] Matthias Fey and Jan Eric Lenssen. 2019. Fast graph representation learning with PyTorch Geometric. In arXiv preprint arXiv:1903.02428.
- [7] Hongyang Gao and Shuiwang Ji. 2019. Graph u-nets. In ICML.
- [8] Justin Gilmer et al. 2017. Neural message passing for quantum chemistry. In ICML.
- [9] William L Hamilton et al. 2017. Inductive representation learning on large graphs. In NeurIPS.
- [10] Yuko Hara et al. 2009. Proposal and quantitative analysis of the CHStone benchmark program suite for practical C-based high-level synthesis. JIP (2009).
- [11] John L Hennessy and David A Patterson. 2011. Computer architecture: a quantitative approach. Elsevier.
- [12] Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. In ICLR.
- [13] Yujia Li et al. 2016. Gated graph sequence neural networks. In ICLR.
- [14] Zheng Ma et al. 2020. Path integral based convolution and pooling for graph neural networks. In NeurIPS.
- [15] Hosein M. Makrani et al. 2019. Pyramid: Machine Learning Framework to Estimate the Optimal Timing and Resource Usage of a High-Level Synthesis Design. In FPL.
- [16] Hosein M. Makrani et al. 2019. XPPE: cross-platform performance estimation of hardware accelerators using machine learning. In ASPDAC.
- [17] Haggai Maron et al. 2019. Provably Powerful Graph Networks. In NeurIPS.
- [18] Kenneth O'Neal et al. 2018. Hlspredict: Cross platform performance prediction for fpga high-level synthesis. In ICCAD.
- [19] André Bannwart Perina et al. 2019. Lina: Timing-Constrained High-Level Synthesis Performance Estimator for Fast DSE. In ICFPT. IEEE.
- [20] Louis-Noël Pouchet and Tomofumi Yuki. 2016. Polyhedral Benchmark suite. http://web.cs.ucla.edu/~pouchet/software/polybench/.
- [21] Brandon Reagen et al. 2014. MachSuite: Benchmarks for Accelerator Design and Customized Architectures. In IISWC.
- [22] Michael Schlichtkrull et al. 2018. Modeling relational data with graph convolutional networks. In ESWC. Springer.
- [23] Ecenur Ustun et al. 2020. Accurate operation delay prediction for FPGA HLS using graph neural networks. In ICCAD.
- [24] Petar Veličković et al. 2017. Graph attention networks. In arXiv preprint arXiv:1710.10903.
- [25] Vitis. Accessed: 2021. Vitis High-Level Synthesis User Guide (UG1399). https://docs.xilinx.com/r/en-US/ug1399-vitis-hls.
- [26] Marilyn Wolf. 2012. Computers as components: principles of embedded computing system design. Elsevier.
- [27] Felix Wu et al. 2019. Simplifying graph convolutional networks. In ICML.
- [28] Nan Wu et al. 2021. IronMan: GNN-assisted Design Space Exploration in High-Level Synthesis via Reinforcement Learning. In GLSVLSI.
- [29] Nan Wu and Yuan Xie. 2021. A Survey of Machine Learning for Computer Architecture and Systems. arXiv preprint arXiv:2102.07952 (2021).
- [30] Xilinx. Accessed: 2021. Xilinx Vitis unified software platform. https://www.xilinx.com/products/design-tools/vitis.html.
- [31] Keyulu Xu et al. 2019. How powerful are graph neural networks?. In ICLR.
- [32] Jieru Zhao et al. 2017. COMBA: A comprehensive model-based analysis framework for high level synthesis of real applications. In ICCAD.

- [33] Jieru Zhao et al. 2019. Performance modeling and directives optimization for high-level synthesis on FPGA. TCAD (2019).